1 Summary

The purpose of this program is to develop algorithms for adaptive
processing  of  multi-sensor  data,  employing  feedback  to  optimize  the  linkage  between 
observed  data  and  sensor  control.  The  envisioned  multi-modal  adaptive  system  is 
applicable  for  intelligence,  surveillance,  and  reconnaissance  (ISR)  in  general 
environments,  addressing  base  and  port  security.  Leveraging  our  previously  developed 
technology,  SIG  is  developing  second-generation  methods  to  adaptively  learn  the 
statistics  of  dynamic  object  behavior  in  video,  while  focusing  on  defining  system 
requirements  for  sensor  deployment  by  using  field  data  (vs.  highly  controlled  indoor 
data).  SIG  is  also  working  closely  with  its  subcontractor,  Lockheed  Martin,  to  integrate 
additional  technologies,  such  as  object  classification  and  recognition,  to  provide  a  more 
robust  and  discriminative  system. 

SIG is aggressively pursuing follow-on funding and technology transition opportunities for
persistent  surveillance  applications.  In  particular,  under  related  IRAD  efforts,  we  are 
working  to  address  current  shortfalls  and  develop  new  methods  for  activity  recognition  of 
vehicles  and  dismounts  in  persistent  airborne  EO/IR  video  imagery.  An  efficient 
framework  is  under  development  for  joint  classification  and  target  tracking  (JCT).  The 
joint  posterior  density  over  target  poses  (e.g.  position,  velocity,  heading)  and  target  type 
is recursively estimated via a Bayesian formulation. It is assumed that appearance
measurements provide information indicative of target class, while kinematic measurements
provide an indirect measure of target pose. An importance sampling approach is used to
efficiently  incorporate  both  types  of  measurements  in  refining  estimation  of  the  joint 
distribution.  The  methods  are  presented  in  the  context  of  providing  real-time  analytic 
metadata  to  reduce  data  transmission  requirements  in  support  of  persistent  video-based 
surveillance. Included in this metadata are the joint posterior distributions for target
tracks and class, which are utilized by an active learning cueing management framework
to optimally task the appropriate sensor modality to cued regions of interest. Moreover,
this  active  learning  approach  also  facilitates  analyst  cueing  to  help  resolve  track 
ambiguities  in  complex  scenes.  We  intend  to  leverage  SIG’s  active  learning  with 
analyst cueing under future efforts with ONR and other DoD agencies. Obtaining accurate
long-term target tracks is a key requirement for activity modeling. We propose
adapting  methods  for  activity  manifold  modeling,  such  that  the  posterior  distributions  of 
vehicle  tracks  themselves  may  be  ultimately  transformed  into  space-time  probability 
manifolds.  Such  representations  will  enable  further  application  of  the  active  learning 
framework for semi-automated activity recognition with limited analyst cueing for
anomaly  and  threat  notification. 
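
As an illustration of the recursive Bayesian estimate described above, the following minimal Python sketch applies importance-sampling (particle filter) updates to a joint pose and class state. It is not the SIG implementation: the 1-D position state, the two classes, the class-dependent speeds, the Gaussian likelihoods, the multinomial resampling step, and the names `jct_step` and `class_posterior` are all assumptions made for the example.

```python
import numpy as np

def jct_step(poses, classes, weights, z_kin, z_app, rng,
             speeds=(1.0, 3.0), app_means=(0.0, 1.0),
             kin_sigma=0.5, app_sigma=0.4, proc_sigma=0.2):
    """One importance-sampling update of the joint (pose, class) posterior.

    poses:   (N,) 1-D positions (toy stand-in for position/velocity/heading)
    classes: (N,) integer class label carried by each particle
    weights: (N,) normalized importance weights
    All model parameters are hypothetical values chosen for illustration.
    """
    speeds = np.asarray(speeds)
    app_means = np.asarray(app_means)
    # Propagate each particle with an assumed class-conditioned motion model.
    poses = poses + speeds[classes] + rng.normal(0.0, proc_sigma, poses.shape)
    # Kinematic likelihood: Gaussian around the measured position.
    lik_kin = np.exp(-0.5 * ((z_kin - poses) / kin_sigma) ** 2)
    # Appearance likelihood: Gaussian around an assumed per-class feature mean.
    lik_app = np.exp(-0.5 * ((z_app - app_means[classes]) / app_sigma) ** 2)
    # Both measurement types refine the same joint weights.
    weights = weights * lik_kin * lik_app
    weights = weights / weights.sum()
    # Multinomial resampling to limit weight degeneracy.
    n = len(weights)
    idx = rng.choice(n, size=n, p=weights)
    return poses[idx], classes[idx], np.full(n, 1.0 / n)

def class_posterior(classes, weights, n_classes=2):
    """Marginal class probabilities obtained by summing particle weights."""
    return np.bincount(classes, weights=weights, minlength=n_classes)

rng = np.random.default_rng(0)
n = 2000
poses = rng.normal(0.0, 1.0, n)
classes = rng.integers(0, 2, n)
weights = np.full(n, 1.0 / n)

# Simulated target of class 1 (speed 3.0, appearance feature near 1.0).
true_pos = 0.0
for _ in range(10):
    true_pos += 3.0
    z_kin = true_pos + rng.normal(0.0, 0.5)
    z_app = 1.0 + rng.normal(0.0, 0.4)
    poses, classes, weights = jct_step(poses, classes, weights, z_kin, z_app, rng)

post = class_posterior(classes, weights)
pose_estimate = float(np.sum(weights * poses))
```

Because each particle carries both a pose and a class label, weighting by the product of the two likelihoods lets kinematic evidence sharpen the class estimate and appearance evidence sharpen the track.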

2  Recent  Technical  Developments 

During  the  recent  performance  period  we  have  made  significant  progress  in  developing  a 
compressive  sensing  model  to  allow  object  detection  and  tracking  directly  in  the 
compressed  domain  before  image  reconstruction. 
2.1 Compressive sampling

The SIG C2CS object tracking effort, as seen in the previous sections, has been shown to
provide an effective approach to data compression when only moving foreground objects are of
interest in a video scene. This is appropriate for applications where the goal is to
detect and track objects as well as to detect anomalous object behaviors. In such applications,
high-rate data streams can be compressed through selective representation of
objects and background, so that only data relevant to tracking objects is sent.

The  concept  of  compressive  sensing  (CS)  also  allows  information  to  be  extracted  from 
sensor  data  (including  video  image  sequences)  using  significantly  fewer  samples  relative 
to conventional uniform sampling techniques. As indicated in our Year III planning letter,
this related technology has been the subject of an effort to examine how CS approaches
can be combined to track objects in time-multiplexed video imagery.

2.2  Applications 

In  DoD  surveillance  applications,  the  Field-of-Regard  (FOR)  has  become  too  large  for  a 
single  imaging  sensor  to  capture  the  activities  for  timely  exploitation.  Using  a  scanning 
imaging  system  will  require  large  areas  of  the  FOR  to  go  un-sensed  while  the  scanner  is 
imaging  another  area.  Multiple  sensors  or  platforms  will  result  in  a  possibly 
unmanageable  data  glut  and  require  extra  processing  (such  as  sensor  registration  or 
mosaicing). For image-based Automatic Target Recognition (ATR) applications, there is
a distinct possibility of significant mismatch between what is being sensed
and what is optimal for the ATR to make a decision. For example, the imaging sensor may
be wasting photon collection resources on irrelevant and confusing clutter
when more photon collection on the target of interest could increase ATR
performance. Ideally, one would use the imaging sensor to help determine the
current position of the target at time samples determined by the object’s velocity, while
imaging other, static portions of the scene at a lower sampling rate.

2.3  Sampling  Approach 

Figure  1  shows  the  basic  approach  to  compressive  imaging  that  we  are  investigating  for 
object  detection  and  tracking.  Here,  the  traditional  sampling  approach  is  shown  as  an 
array  of  128x128  pixels  where  each  pixel  is  integrated  over  a  period  of  128  high 
resolution time steps. The assumption is that there may be object motion that is
detectable by the high resolution sensor but that traditional single-pixel
integration will fail to resolve. In the proposed compressive sampling
approach, however, each output value is not a single integrated pixel; instead, a linear
combination of 128 values is created at each time interval to form 128 “super-pixels” at
each time step. The values collected at each of the time steps can then be processed to
reconstruct details (e.g. moving objects) that could not be resolved using the traditional
single pixel integration approach.
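
The two sampling schemes can be contrasted with a short Python sketch. It is illustrative only: the toy scene, the binary (0/1) measurement weights, the equal-size random pixel groups, the choice to draw a new random grouping at every time step, and the names `random_superpixel_matrix` and `frame_at` are assumptions, not details taken from the report.

```python
import numpy as np

def random_superpixel_matrix(n_pixels, n_groups, rng):
    """Binary matrix whose row g sums the pixels randomly assigned to super-pixel g."""
    # Shuffle pixel indices, then deal them into equal-size groups.
    labels = rng.permutation(n_pixels) % n_groups
    phi = np.zeros((n_groups, n_pixels))
    phi[labels, np.arange(n_pixels)] = 1.0
    return phi

rng = np.random.default_rng(0)
side, n_steps, n_groups = 128, 128, 128

# Toy scene: static background plus a bright object moving one column per step.
background = rng.uniform(0.0, 1.0, (side, side))

def frame_at(t):
    frame = background.copy()
    frame[60:64, t % side] += 5.0
    return frame

# Traditional sampling: every pixel integrates over all high-resolution time
# steps, so the moving object is smeared into a streak.
traditional = sum(frame_at(t) for t in range(n_steps))

# Compressive sampling: at each time step, record 128 super-pixel values,
# each a linear combination (here a plain sum) of 128 randomly chosen pixels.
measurements = np.empty((n_steps, n_groups))
for t in range(n_steps):
    phi_t = random_superpixel_matrix(side * side, n_groups, rng)
    measurements[t] = phi_t @ frame_at(t).ravel()
```

Under these assumptions both schemes output the same number of values (128 x 128), but the compressive measurements retain one set of values per time step rather than a single time-integrated frame.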




Quarterly  Status  Report  N00014-05-C-0294 


August  2008 


Figure 1: Left - The traditional methodology for sampling a video frame. We integrate the photons along the
red rectangular elements to arrive at a pixel intensity value. Right - A compressive sampling architecture.
All same-colored “pixels” are integrated together as one large “super-pixel”. There are N super-pixels
collected at N different time intervals. The spatial distribution of the same-colored pixels is randomized.

2.4  Initial  Results 


Figure 2: Top left: The information seen by a traditional camera. Top right: The vector of data collected
during the compressive sampling (a linear function of each image). Bottom left: The ground truth video. This is
unobservable to the system, as we are assuming the motion is too fast for traditional sampling. Bottom right:
The successive estimates of the difference imagery collected and estimated.


Figure 2 shows an initial result of the compressive imaging approach, whereby the
difference between successive images is found before image reconstruction. Note that
each 128x128 pixel image is compressively sampled using only 128 samples per frame.
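
Taking the difference before reconstruction works because the measurements are linear in the image: the difference of two compressed frames equals the compressed difference image, which is zero wherever nothing moved. The Python sketch below demonstrates only this property. The report does not state how the difference image is subsequently reconstructed, so no reconstruction step is shown, and the fixed random super-pixel assignment, the toy frames, and the name `measure` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
side, n_groups = 128, 128
n_pixels = side * side

# Assumed: one fixed random super-pixel assignment shared by both frames.
labels = rng.permutation(n_pixels) % n_groups

def measure(frame):
    """Sum the pixels in each super-pixel (a linear map of the image)."""
    return np.bincount(labels, weights=frame.ravel(), minlength=n_groups)

background = rng.uniform(0.0, 1.0, (side, side))
frame_a = background.copy()
frame_b = background.copy()
frame_a[60:62, 10:12] += 5.0   # object at its old position
frame_b[60:62, 11:13] += 5.0   # object shifted by one column

# Difference taken directly on the 128 compressed samples per frame.
diff_compressed = measure(frame_b) - measure(frame_a)
# Compressed samples of the (normally unobserved) pixel-domain difference.
diff_of_frames = measure(frame_b - frame_a)

# Super-pixels whose value changed flag motion without any reconstruction.
changed = np.flatnonzero(np.abs(diff_compressed) > 1e-6)
# A static scene produces an all-zero compressed difference.
static_diff = measure(background) - measure(background)
```

Only the few super-pixels that contain a changed pixel differ between the two frames, so the static background cancels in the compressed domain.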

3  Future  Directions 

During  the  next  reporting  period,  we  will  continue  work  on  using  the  concepts  of 
compressive  sensing  to  address  object  detection  and  tracking.  We  will  continue  the  work 
on  transition  of  the  C2CS  technology  for  airborne  persistent  surveillance  applications,  as 
well  as  for  integration  with  systems  that  use  analyst-in-the-loop  technology  for  analyst 
cueing.  We  will  also  prepare  the  final  report  for  the  SIG  C2CS  effort  to  close  out  the 
contract  period. 




